
Databricks support #142


Merged: @diveart merged 5 commits into master from dbt-databricks-profile on Jul 15, 2024.



@diveart (Collaborator) commented on Jul 15, 2024:

Goal: allow a consistent demo across Snowflake and Databricks.

This PR:

  1. introduces a secondary replication job for the master branch (to replicate the master-databricks branch from master);
  2. adds a Databricks-based dbt profile, so that the project code can also be executed against the Databricks data warehouse;
  3. introduces PR and prod jobs for the Databricks profile;
  4. minimally adjusts the Snowflake PR and prod jobs.
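Items 2 and 3 imply one dbt project with a Databricks target alongside the existing Snowflake one. A minimal sketch, assuming a hypothetical profile named `demo`, of the fields a dbt-databricks output typically carries, written as a Python dict that mirrors the nesting of `profiles.yml`. The catalog, schema and environment-variable names are placeholders, and the PR's actual profile is not shown on this page.

```python
"""Sketch of a dbt profile with a Databricks output, mirroring profiles.yml (hypothetical names)."""

# Nesting mirrors profiles.yml: profile name -> default target + outputs -> settings per target.
# The profile name, catalog/schema values and env var names are all hypothetical.
PROFILES = {
    "demo": {
        "target": "databricks",  # default for this sketch; CI jobs can override it with --target
        "outputs": {
            # The existing Snowflake output is omitted from this sketch.
            "databricks": {
                "type": "databricks",
                "catalog": "demo",
                "schema": "core",
                "host": "{{ env_var('DATABRICKS_HOST') }}",
                "http_path": "{{ env_var('DATABRICKS_HTTP_PATH') }}",
                "token": "{{ env_var('DATABRICKS_TOKEN') }}",
                "threads": 4,
            },
        },
    },
}

# Fields the dbt-databricks adapter needs for token authentication ("catalog" is optional).
REQUIRED_DATABRICKS_KEYS = {"type", "schema", "host", "http_path", "token"}


def missing_keys(profiles: dict, profile: str, target: str) -> set:
    """Return the required Databricks keys absent from the given profile output."""
    output = profiles[profile]["outputs"][target]
    return REQUIRED_DATABRICKS_KEYS - set(output)


if __name__ == "__main__":
    gaps = missing_keys(PROFILES, "demo", "databricks")
    print("databricks output looks complete" if not gaps else f"missing keys: {sorted(gaps)}")
```

A Databricks PR or prod job would then select this output with `dbt build --target databricks`, while the Snowflake jobs keep their own target.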

How it is supposed to work:

  1. PRs against the master branch will be executed in Snowflake; PRs against the master-databricks branch will be executed in Databricks.
  2. Code (all repository files) from the master branch will be replicated into the master-databricks branch, manually for now and automatically (daily) later.
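The routing rule in item 1 and the replication in item 2 can be sketched as follows. This is an illustration under assumptions, not the PR's job code: the dbt target names (`snowflake`, `databricks`) and the remote name `origin` are hypothetical, and the functions only build commands rather than run them. `dbt build --target` and the git refspec syntax are standard.

```python
"""Sketch of branch-based warehouse routing and master -> master-databricks replication."""

REMOTE = "origin"
SOURCE_BRANCH = "master"
MIRROR_BRANCH = "master-databricks"

# Hypothetical mapping: base branch of a PR -> dbt target (warehouse) its CI job uses.
TARGET_BY_BASE_BRANCH = {
    SOURCE_BRANCH: "snowflake",
    MIRROR_BRANCH: "databricks",
}


def dbt_command_for_pr(base_branch: str) -> list[str]:
    """Build the dbt invocation for a PR, choosing the warehouse by its base branch."""
    try:
        target = TARGET_BY_BASE_BRANCH[base_branch]
    except KeyError:
        raise ValueError(f"no warehouse configured for base branch {base_branch!r}") from None
    return ["dbt", "build", "--target", target]


def replication_commands(
    source: str = SOURCE_BRANCH, mirror: str = MIRROR_BRANCH, remote: str = REMOTE
) -> list[list[str]]:
    """Git commands that copy the tip of `source` onto `mirror` on the remote."""
    tracking_ref = f"refs/remotes/{remote}/{source}"
    return [
        # Update the remote-tracking ref for the source branch.
        ["git", "fetch", remote, f"+refs/heads/{source}:{tracking_ref}"],
        # Push that commit to the mirror branch. Without --force, git rejects the
        # push if the mirror has diverged (non-fast-forward) instead of overwriting it.
        ["git", "push", remote, f"{tracking_ref}:refs/heads/{mirror}"],
    ]


if __name__ == "__main__":
    import shlex

    print(shlex.join(dbt_command_for_pr("master-databricks")))
    for cmd in replication_commands():
        print(shlex.join(cmd))
```

The push deliberately omits `--force`: if master-databricks has commits that master lacks, git rejects the update instead of discarding them. The PR does not say whether its job force-pushes, merges, or fast-forwards.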

This approach relies on Datafold's Custom Base Branch feature.

@diveart requested reviews from leoebfolsom and vvkh on July 15, 2024 at 13:10.

@datafold (bot) left a comment:


Use Datafold to diff your data, see the downstream impact, then post the results back to this PR.


@datafold (bot) left a comment:


Base branch: master (7c3b79a). Pull request branch: dbt-databricks-profile (c59511b).
⚠️ Datafold could not compare your changes to the commit (ce77cc4) in the master/main branch that your current branch was created from. Instead, Datafold has used artifacts from the most recent available commit (7c3b79a) from the master/main branch. To resolve this, please provide dbt artifacts for the commit ce77cc4 using the datafold-sdk by [following these instructions](https://docs.datafold.com/integrations/orchestrators/custom_integrations/#submit-dbt-artifacts).
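The warning describes a fallback: when no dbt artifacts exist for the commit the branch was created from, Datafold compares against the most recent master commit that has them. A conceptual sketch of that selection rule as the message states it, not Datafold's implementation; the function name and the placeholder commit IDs are hypothetical.

```python
"""Conceptual sketch of the baseline-selection fallback described in the warning (not Datafold's code)."""

from typing import Optional


def pick_baseline(
    master_history: list[str], merge_base: str, with_artifacts: set[str]
) -> tuple[Optional[str], bool]:
    """Choose which master commit's dbt artifacts to compare a PR against.

    master_history: master commit IDs, newest first.
    merge_base: the commit the PR branch was created from.
    with_artifacts: commits for which dbt artifacts were uploaded.
    Returns (chosen commit or None, True if a fallback was needed).
    """
    # Preferred: the exact commit the branch was created from.
    if merge_base in with_artifacts:
        return merge_base, False
    # Fallback: the most recent master commit that has artifacts.
    for commit in master_history:
        if commit in with_artifacts:
            return commit, True
    # No artifacts anywhere on master: nothing to compare against.
    return None, True


if __name__ == "__main__":
    # Placeholder commit IDs, newest first; "c2" plays the role of the merge base.
    history = ["c4", "c3", "c2", "c1"]
    print(pick_baseline(history, merge_base="c2", with_artifacts={"c3", "c1"}))
```

Uploading artifacts for the merge-base commit, as the warning suggests, makes the first branch of the rule apply and removes the fallback.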
Data Diffs of tables modified in this pull request: 5
  • No PK: 5
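Every modified table is flagged "No PK", and the warnings below note that a full Data Diff needs a primary key. As a conceptual sketch (not Datafold's algorithm), a primary key lets the two versions of a table be matched row by row and classified as added, removed, or changed; the function and sample rows are hypothetical.

```python
"""Conceptual sketch: a keyed diff of two table snapshots (hypothetical data)."""


def keyed_diff(base: list[dict], head: list[dict], key: str) -> dict:
    """Match rows by primary key and classify them as added, removed, or changed."""
    base_by_key = {row[key]: row for row in base}
    head_by_key = {row[key]: row for row in head}
    # A primary key must be unique; duplicates would collapse rows silently.
    if len(base_by_key) != len(base) or len(head_by_key) != len(head):
        raise ValueError(f"column {key!r} is not unique, so it cannot serve as a primary key")
    common = base_by_key.keys() & head_by_key.keys()
    return {
        "added": sorted(head_by_key.keys() - base_by_key.keys()),
        "removed": sorted(base_by_key.keys() - head_by_key.keys()),
        "changed": sorted(k for k in common if base_by_key[k] != head_by_key[k]),
    }


if __name__ == "__main__":
    master = [{"id": 1, "plan": "free"}, {"id": 2, "plan": "pro"}]
    pr = [{"id": 1, "plan": "free"}, {"id": 2, "plan": "team"}, {"id": 3, "plan": "pro"}]
    print(keyed_diff(master, pr, key="id"))
```

Without a key, the same two snapshots can only be compared in aggregate (2 rows vs. 3 rows), which is the kind of result reported below.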
DEMO.CORE.SUBSCRIPTION__CREATED
⚠️ To perform a full Data Diff, please set the primary key for the table.
  • Total rows: 200 (master) → 202 (PR branch), +1.0%
  • Unchanged attributes: 6 columns, 0 schema changes

DEMO.CORE.FEATURE__USED
⚠️ To perform a full Data Diff, please set the primary key for the table.
  • Total rows: 1,012 (master) → 1,021 (PR branch), +0.9%
  • Unchanged attributes: 3 columns, 0 schema changes

DEMO.CORE.USER__CREATED
⚠️ To perform a full Data Diff, please set the primary key for the table.
  • Total rows: 607 (master) → 612 (PR branch), +0.8%
  • Unchanged attributes: 7 columns, 0 schema changes

DEMO.CORE.SIGNED__IN
⚠️ To perform a full Data Diff, please set the primary key for the table.
  • Total rows: 656 (master) → 662 (PR branch), +0.9%
  • Unchanged attributes: 3 columns, 0 schema changes

DEMO.CORE.ORG__CREATED
⚠️ To perform a full Data Diff, please set the primary key for the table.
  • Total rows: 408 (master) → 411 (PR branch), +0.7%
  • Unchanged attributes: 5 columns, 0 schema changes
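The change column is the relative difference in row count, (PR rows - master rows) / master rows, shown with one decimal place. A small check that recomputes it from the five counts above; the names are illustrative.

```python
"""Recompute the "Total rows" change column from the row counts reported above."""

# (master rows, PR-branch rows, change as reported in the Datafold comment)
ROW_COUNTS = {
    "SUBSCRIPTION__CREATED": (200, 202, "+1.0%"),
    "FEATURE__USED": (1012, 1021, "+0.9%"),
    "USER__CREATED": (607, 612, "+0.8%"),
    "SIGNED__IN": (656, 662, "+0.9%"),
    "ORG__CREATED": (408, 411, "+0.7%"),
}


def pct_change(base: int, head: int) -> str:
    """Relative change from base to head as a signed percentage with one decimal place."""
    return f"{(head - base) / base * 100:+.1f}%"


if __name__ == "__main__":
    for table, (base, head, reported) in ROW_COUNTS.items():
        print(f"{table}: {pct_change(base, head)} (reported {reported})")
```

For example, SUBSCRIPTION__CREATED gains 2 rows on a base of 200, which is exactly 1.0%.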

Skipped Data Diffs of downstream tables: 5. Add the "datafold:diff-all-downstream" label to this pull request to diff all affected tables.
  • DEMO.CORE.dim__users (table)
  • DEMO.CORE.fct__monthly__financials (table)
  • DEMO.CORE.dim__orgs (table)
  • DEMO.CORE.sales__sync (table)
  • DEMO.CORE.fct__yearly__financials (table)
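The label can be added in the GitHub UI, with the GitHub CLI (`gh pr edit 142 --add-label "datafold:diff-all-downstream"`), or through the REST API's "add labels to an issue" endpoint, which also accepts pull request numbers. A minimal sketch of the REST call using only the standard library; the owner, repository and token are placeholders because this page does not name the repository, and the request is built but not sent.

```python
"""Sketch: add the datafold:diff-all-downstream label to a PR via the GitHub REST API."""

import json
import urllib.request

LABEL = "datafold:diff-all-downstream"


def add_label_request(owner: str, repo: str, pr_number: int, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the 'add labels to an issue' endpoint.

    Pull requests share numbering with issues, so the issues endpoint labels PRs too.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/labels"
    body = json.dumps({"labels": [LABEL]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # OWNER, REPO and the token are placeholders (hypothetical values).
    req = add_label_request("OWNER", "REPO", 142, token="<token>")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```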


@datafold (bot) left a comment:


Base branch: master (7c3b79a). Pull request branch: dbt-databricks-profile (1f9564a).
⚠️ Datafold could not compare your changes to the commit (ce77cc4) in the master/main branch that your current branch was created from. Instead, Datafold has used artifacts from the most recent available commit (7c3b79a) from the master/main branch. To resolve this, please provide dbt artifacts for the commit ce77cc4 using the datafold-sdk by [following these instructions](https://docs.datafold.com/integrations/orchestrators/custom_integrations/#submit-dbt-artifacts).
Data Diffs of tables modified in this pull request: 5
  • No PK: 5
DEMO.CORE.USER__CREATED
⚠️ To perform a full Data Diff, please set the primary key for the table.
  • Total rows: 607 (master) → 612 (PR branch), +0.8%
  • Unchanged attributes: 7 columns, 0 schema changes

DEMO.CORE.SIGNED__IN
⚠️ To perform a full Data Diff, please set the primary key for the table.
  • Total rows: 656 (master) → 662 (PR branch), +0.9%
  • Unchanged attributes: 3 columns, 0 schema changes

DEMO.CORE.SUBSCRIPTION__CREATED
⚠️ To perform a full Data Diff, please set the primary key for the table.
  • Total rows: 200 (master) → 202 (PR branch), +1.0%
  • Unchanged attributes: 6 columns, 0 schema changes

DEMO.CORE.ORG__CREATED
⚠️ To perform a full Data Diff, please set the primary key for the table.
  • Total rows: 408 (master) → 411 (PR branch), +0.7%
  • Unchanged attributes: 5 columns, 0 schema changes

DEMO.CORE.FEATURE__USED
⚠️ To perform a full Data Diff, please set the primary key for the table.
  • Total rows: 1,012 (master) → 1,021 (PR branch), +0.9%
  • Unchanged attributes: 3 columns, 0 schema changes

Skipped Data Diffs of downstream tables: 5. Add the "datafold:diff-all-downstream" label to this pull request to diff all affected tables.
  • DEMO.CORE.dim__orgs (table)
  • DEMO.CORE.fct__yearly__financials (table)
  • DEMO.CORE.dim__users (table)
  • DEMO.CORE.sales__sync (table)
  • DEMO.CORE.fct__monthly__financials (table)


@datafold (bot) left a comment:


Base branch: master (7c3b79a). Pull request branch: dbt-databricks-profile (1dd746b).
⚠️ Datafold could not compare your changes to the commit (ce77cc4) in the master/main branch that your current branch was created from. Instead, Datafold has used artifacts from the most recent available commit (7c3b79a) from the master/main branch. To resolve this, please provide dbt artifacts for the commit ce77cc4 using the datafold-sdk by [following these instructions](https://docs.datafold.com/integrations/orchestrators/custom_integrations/#submit-dbt-artifacts).
Data Diffs of tables modified in this pull request: 5
  • No PK: 5
DEMO.CORE.ORG__CREATED
⚠️ To perform a full Data Diff, please set the primary key for the table.
  • Total rows: 408 (master) → 411 (PR branch), +0.7%
  • Unchanged attributes: 5 columns, 0 schema changes

DEMO.CORE.USER__CREATED
⚠️ To perform a full Data Diff, please set the primary key for the table.
  • Total rows: 607 (master) → 612 (PR branch), +0.8%
  • Unchanged attributes: 7 columns, 0 schema changes

DEMO.CORE.FEATURE__USED
⚠️ To perform a full Data Diff, please set the primary key for the table.
  • Total rows: 1,012 (master) → 1,021 (PR branch), +0.9%
  • Unchanged attributes: 3 columns, 0 schema changes

DEMO.CORE.SUBSCRIPTION__CREATED
⚠️ To perform a full Data Diff, please set the primary key for the table.
  • Total rows: 200 (master) → 202 (PR branch), +1.0%
  • Unchanged attributes: 6 columns, 0 schema changes

DEMO.CORE.SIGNED__IN
⚠️ To perform a full Data Diff, please set the primary key for the table.
  • Total rows: 656 (master) → 662 (PR branch), +0.9%
  • Unchanged attributes: 3 columns, 0 schema changes

Skipped Data Diffs of downstream tables: 5. Add the "datafold:diff-all-downstream" label to this pull request to diff all affected tables.
  • DEMO.CORE.dim__orgs (table)
  • DEMO.CORE.dim__users (table)
  • DEMO.CORE.sales__sync (table)
  • DEMO.CORE.fct__yearly__financials (table)
  • DEMO.CORE.fct__monthly__financials (table)

@diveart merged commit 42f9d19 into master on Jul 15, 2024 (2 checks passed).
@diveart deleted the dbt-databricks-profile branch on July 15, 2024 at 18:37.